About the data

This data is from the Bureau van Dijk OSIRIS database, which contains comprehensive financial and ownership information on public companies, banks, and insurance companies globally. It provides standardised and “as reported” financials, earnings estimates, ownership data, and news. Visit the link in the References section to learn more about the variables reported in this data.

The data covers 2018-2021, which overlaps with the pandemic. The subset provided covers Indonesian companies and comes in four separate files, one per year: osiris_Indonesia_2018.rda, osiris_Indonesia_2019.rda, osiris_Indonesia_2020.rda, osiris_Indonesia_2021.rda.

The primary question motivating the analysis is “How did the pandemic affect corporate financials in Indonesia?”

Steps to take

  1. Take a glimpse of the data
  2. Decide which variables to focus on
  3. Combine the data sets, and select only the variables being used
  4. Check the availability of data and the missing value patterns
  5. Handle missing values
  6. Make a few plots of the data to understand its scope and the effect of the pandemic
  7. Check some unusual patterns with visual inference
  8. Fit a model to refine the view to focus on the main question
  9. Diagnose the model fit
  10. Refine a data visualisation to communicate results
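
As a rough illustration of steps 3 and 4, the sketch below stacks two yearly data sets, keeps a chosen subset of variables, and counts missing values. It uses base R only and runs on toy data: the column names (company, revenue, employees) and the values are invented for the example and are not the real OSIRIS variables, so treat it as a pattern to adapt rather than the workshop's own code.

```r
# Toy stand-ins for two of the yearly data sets. The column names and
# values are hypothetical; they are NOT the real OSIRIS variables.
osiris_2019 <- data.frame(
  company = c("A", "B", "C"),
  revenue = c(120, NA, 310),
  employees = c(15, 40, NA),
  stringsAsFactors = FALSE
)
osiris_2020 <- data.frame(
  company = c("A", "B", "C"),
  revenue = c(95, 210, NA),
  employees = c(14, NA, NA),
  stringsAsFactors = FALSE
)

# Step 3: tag each data set with its year, then stack them row-wise
yearly <- list(`2019` = osiris_2019, `2020` = osiris_2020)
combined <- do.call(rbind, lapply(names(yearly), function(yr) {
  cbind(year = as.integer(yr), yearly[[yr]])
}))
rownames(combined) <- NULL

# Keep only the variables chosen in step 2 (hypothetical choice here)
keep <- c("year", "company", "revenue")
combined <- combined[, keep]

# Step 4: count missing values per variable, and per year for one variable
missing_by_variable <- colSums(is.na(combined))
missing_by_year <- tapply(is.na(combined$revenue), combined$year, sum)

print(missing_by_variable)
print(missing_by_year)
```

Adding the year as a column before stacking keeps track of which file each row came from, which is what later makes before-and-after pandemic comparisons possible.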

Setup

Work along with me…

  1. Open RStudio and create a project, in a new directory/folder for this workshop, e.g. bigdata. This will create a folder called bigdata and a project file bigdata.Rproj.
  2. Download the zip file, which contains the data and code to get started. Unzip it into the bigdata directory. It will create the sub-directory data, which will contain the data, along with some starting files.
  3. Open the file sandbox.R. This is where we will write and experiment with code for the analysis. The purpose of working in R scripts first is to get the code working satisfactorily before embedding it into a Quarto document that can be re-built with new data and summarises the work.
  4. Use the starter code in sandbox.R to read the data and take a quick look.
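
If the starter code is not to hand, reading the files might look something like the sketch below. It is not the contents of sandbox.R. It assumes the four .rda files sit directly in the data sub-directory and that each file stores a single data frame; read_rda is a hypothetical helper written for this example, needed because load() restores objects under whatever names they were saved with.

```r
# Hypothetical helper: load one .rda file into a private environment
# and return the object stored in it.
# Assumption: each file holds exactly one object (a data frame).
read_rda <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)
  stopifnot(length(obj_names) == 1)
  get(obj_names, envir = env)
}

# Build the four file paths (assumed to live directly in data/)
years <- 2018:2021
paths <- file.path("data", paste0("osiris_Indonesia_", years, ".rda"))

# Read only the files that are actually present, one list element per year
found <- file.exists(paths)
osiris <- lapply(paths[found], read_rda)
names(osiris) <- years[found]

# Quick look: dimensions of each year, then the structure of the first one
for (yr in names(osiris)) {
  cat("Year", yr, ":", nrow(osiris[[yr]]), "rows,",
      ncol(osiris[[yr]]), "columns\n")
}
if (length(osiris) > 0) str(osiris[[1]], list.len = 10)
```

Here str() is the base R way to take a quick look; with dplyr installed, glimpse() gives a similar compact view.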

Resources